Using Kart and GitHub for versioning and collaborating with spatial data in archaeological research

Archeo.FOSS 17 (Turin, 12-13 December)

Andrea Titolo

University of Turin

Alessio Palmisano

University of Turin

Talk overview

  • Introduction
    • Open Science and version control in archaeology
    • Git and its limitations
  • Git for geospatial data
    • Description and features of Kart
  • Practical applications of Kart in archaeology
    • Description of the project
    • How we are using Kart
  • Thoughts and conclusions

Introduction

Open Science and transparency of the process

  • One of the (many) aims of Open Science: openness and transparency of the processes behind data creation and results
  • “Data must have history” Strupler and Wilkinson (2017)

Wallis (2022)

Version control

  • Transparent process through “snapshots” at different stages
  • Easy roll-back to previous versions if something goes wrong
  • Replaces the many corrected and renamed copies of the same file with a single tracked history
  • Greater accountability and better documentation (Kansa 2012)
  • Enhances Open Science practices (Marwick 2017)

Source: xkcd

Git

  • Distributed version control system
  • Originally developed to track changes in the Linux kernel
  • Also adapted to non-coding environments
  • Git is still not user-friendly software
  • Graphical frontends do not always help

Source: xkcd

Distributed version control and archaeology

  • Archaeology has come a long way in adopting version control
  • Used mainly for programming/scripting and publication
  • Some attempts to adapt it to fieldwork practices


Source: Strupler and Wilkinson (2017: 5)

Source: Strupler and Wilkinson (2017: 4)

Git and binary files

  • Binary files: images, Word documents, Excel files
  • Git is not as efficient with binary files as it is with plain text (it stores the entire file at every change)
  • Storage issues, harder to track changes
  • For text documents, plain text can sometimes be the answer, but what about GIS data and relational databases?
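
A minimal illustration of why plain text suits Git: version-control diffs are computed line by line, so correcting one record in a CSV file shows up as a single changed line. The sketch uses Python's standard `difflib` as a stand-in for `git diff`; the table and its values are hypothetical. A binary file such as a GeoPackage or a Word document has no comparable line structure, so a line-based diff of it is not meaningful.

```python
import difflib

# Two versions of a small, hypothetical site table exported as plain-text CSV.
v1 = [
    "site_id,name,period\n",
    "1,Site A,Iron Age II\n",
    "2,Site B,Roman\n",
    "3,Site C,Byzantine\n",
]
# The second version corrects the period of a single site.
v2 = [
    "site_id,name,period\n",
    "1,Site A,Iron Age II\n",
    "2,Site B,Hellenistic\n",
    "3,Site C,Byzantine\n",
]

# unified_diff compares the two versions line by line.
diff = list(difflib.unified_diff(v1, v2, fromfile="sites_v1.csv", tofile="sites_v2.csv"))

# Keep only removed (-) and added (+) lines, skipping the ---/+++ file headers.
changed = [
    line for line in diff
    if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
]
print("".join(diff))
```

Only the edited record appears in `changed`, once as a removal and once as an addition; the unchanged rows are just context.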


What about geospatial data?

  • In GIS, research process is often obscured by the point-and-click nature of the GUI
  • QGIS graphical models can help make some analyses reproducible
  • Scripts for data cleaning

For many in archaeology, for whom using GIS to visualise results is essentially a graphical-based point-and-click process, advocating a return to code may seem like a backward step. We understand the arguments for usability, and acknowledge that intermediate tools which can bridge point-and-click with code-based approaches are desperately required.

Strupler and Wilkinson (2017)

Git for geospatial data

Git for distributed version control of geospatial data

Kart features

  • Works with different formats and databases: GeoPackage, PostgreSQL/PostGIS, MySQL, Microsoft SQL Server
  • Supports most geospatial data types: vector, raster, point clouds (e.g. lidar)
  • Planned support for shapefiles
  • “Built on git, works like git”

Kart features

  • Tracks changes at the row and cell level within each layer
  • Command-line interface (CLI) tool
  • Standard git workflow
    • kart status
    • kart add
    • kart commit
    • kart pull
    • kart push
    • kart log
    • kart switch/branch
  • Scriptable
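
Row- and cell-level tracking can be sketched conceptually as comparing two versions of a table keyed by primary key, rather than comparing whole files or lines. This toy function is not Kart's implementation (Kart's internal storage and diff machinery are not shown here); it only illustrates the idea, using hypothetical sites and placeholder WKT geometries.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]   # column name -> value
Table = Dict[int, Row]    # primary key -> row
CellChanges = Dict[int, Dict[str, Tuple[object, object]]]


def row_cell_diff(old: Table, new: Table) -> Tuple[List[int], List[int], CellChanges]:
    """Return (inserted keys, deleted keys, {key: {column: (old, new)}})."""
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated: CellChanges = {}
    # For rows present in both versions, record only the cells that differ.
    for pk in sorted(old.keys() & new.keys()):
        columns = old[pk].keys() | new[pk].keys()
        changes = {
            col: (old[pk].get(col), new[pk].get(col))
            for col in sorted(columns)
            if old[pk].get(col) != new[pk].get(col)
        }
        if changes:
            updated[pk] = changes
    return inserted, deleted, updated


# Hypothetical example: one site's period is corrected and a new site is added.
old: Table = {
    1: {"name": "Site A", "period": "Iron Age II", "geom": "POINT(0 0)"},
    2: {"name": "Site B", "period": "Roman", "geom": "POINT(1 1)"},
}
new: Table = {
    1: {"name": "Site A", "period": "Iron Age II", "geom": "POINT(0 0)"},
    2: {"name": "Site B", "period": "Hellenistic", "geom": "POINT(1 1)"},
    3: {"name": "Site C", "period": "Byzantine", "geom": "POINT(2 2)"},
}
inserted, deleted, updated = row_cell_diff(old, new)
print(inserted, deleted, updated)
```

The result reports one inserted feature (key 3) and a single changed cell (the period of key 2), the kind of granularity a whole-file comparison cannot provide.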

Kart QGIS Plugin

  • The QGIS plugin offers a graphical user interface
  • All the Kart commands are available
  • Visual tool to inspect changes

Remote Collaboration

  • Host data in remote repositories
  • Compatible with all QGIS styles
  • Potential to mitigate issues regarding data sharing

Kart for archaeology

Project presentation

Dataset

  • Digitization still in progress
  • 2,065 sites collected so far
  • 5,684 occupation phases

Dataset organization

  • QGIS attribute table
  • QGIS form
    • General
    • Archaeological
    • Geospatial
    • References
  • Background tables
  • All versioned in Kart
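
The slides do not describe the schema in detail. As a purely hypothetical sketch, assuming sites and their occupation phases are stored in separate tables (2,065 sites carry 5,684 phases, so one site can have several) and that background tables act as lookup lists of controlled values, the structure could look like this. It uses Python's built-in `sqlite3` because a GeoPackage is an SQLite database; geometry columns and the GeoPackage metadata tables are omitted, and all table and column names are illustrative.

```python
import sqlite3

# In-memory database standing in for the project GeoPackage (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE periods (            -- background (lookup) table
    period_id INTEGER PRIMARY KEY,
    label     TEXT NOT NULL UNIQUE
);
CREATE TABLE sites (              -- one row per site; geometry omitted here
    fid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE phases (             -- one row per occupation phase of a site
    phase_id  INTEGER PRIMARY KEY,
    site_fid  INTEGER NOT NULL REFERENCES sites(fid),
    period_id INTEGER NOT NULL REFERENCES periods(period_id)
);
""")

# Controlled vocabulary in the background table (ids 1, 2, 3).
con.executemany("INSERT INTO periods (label) VALUES (?)",
                [("Iron Age II",), ("Hellenistic",), ("Roman",)])
con.execute("INSERT INTO sites (fid, name) VALUES (1, 'Site A')")
# Site A has two occupation phases: Iron Age II and Roman.
con.executemany("INSERT INTO phases (site_fid, period_id) VALUES (?, ?)",
                [(1, 1), (1, 3)])

# Join phases back to their site and period labels.
rows = con.execute("""
SELECT s.name, p.label
FROM phases ph
JOIN sites s   ON s.fid = ph.site_fid
JOIN periods p ON p.period_id = ph.period_id
ORDER BY ph.phase_id
""").fetchall()
print(rows)
```

Keeping periods in a lookup table means every phase references the same controlled label, so a change to a label is a single-cell edit in the version history.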

Project structure

  • Organization on GitHub
  • Project tasks tracked as GitHub issues
  • Separate repositories for different datasets
  • Granular control of licenses, publications, repo access

Using Kart in our project

  • Relatively simple workflow
  • Two main uses
  • Collaboration between project members
    • Simple git workflow
    • One branch per person, pushed and merged into main
  • Keeping track of dataset changes
    • Transparency of the process
    • File (and methods) history
    • Inspect beyond the final product
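
Because Kart is a command-line tool, the per-person branch workflow can be scripted. A minimal sketch, assuming Kart's `branch`, `switch`, `pull`, `commit`, `push` and `merge` subcommands behave like their Git counterparts, and assuming a remote called `origin`, a default branch called `main` and a hypothetical `<member>-edits` branch naming convention. By default it only prints the commands (`dry_run=True`); the editing in QGIS happens between the two phases.

```python
import shlex
import subprocess
from typing import List

Command = List[str]


def start_commands(member: str) -> List[Command]:
    """Sync main, then create and switch to a personal branch (hypothetical naming)."""
    branch = f"{member}-edits"
    return [
        ["kart", "switch", "main"],
        ["kart", "pull"],
        ["kart", "branch", branch],
        ["kart", "switch", branch],
    ]


def finish_commands(member: str, message: str) -> List[Command]:
    """Commit the QGIS edits, push the branch and merge it into main."""
    branch = f"{member}-edits"
    return [
        ["kart", "status"],                  # review what changed in the working copy
        ["kart", "commit", "-m", message],
        ["kart", "push", "origin", branch],  # share the branch on the remote
        ["kart", "switch", "main"],
        ["kart", "pull"],                    # pick up work already merged by others
        ["kart", "merge", branch],
        ["kart", "push", "origin", "main"],
    ]


def run(commands: List[Command], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command


run(start_commands("member1"))
# ... edit the dataset in QGIS here ...
run(finish_commands("member1", "Digitise new sites"))
```

Keeping the dry run as the default makes it safe to check the sequence before touching the shared repository.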

Using Kart in our project - issues

  • Few issues so far (small team)
  • Collaboration tested on macOS 13 Ventura and macOS 12 Monterey; issues with macOS 11 Big Sur
  • Kart also tested on Ubuntu-based Linux (Pop!_OS)
  • Primary key conflicts when working with GeoPackage
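
The slides do not spell out the cause of the primary key conflicts. One plausible mechanism, assuming each member adds features in their own working copy and lets the GeoPackage assign integer keys: both copies start from the same committed data and independently pick the same next key for different features, which then collide when the branches are merged. The sketch reproduces only the key assignment with Python's `sqlite3` (a GeoPackage is an SQLite database); Kart is not involved, and the table is hypothetical.

```python
import sqlite3


def make_working_copy() -> sqlite3.Connection:
    """Hypothetical working copy holding the three sites already committed on main."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sites (fid INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    con.executemany("INSERT INTO sites (fid, name) VALUES (?, ?)",
                    [(1, "Site A"), (2, "Site B"), (3, "Site C")])
    return con


copy_a = make_working_copy()  # member A's branch
copy_b = make_working_copy()  # member B's branch

# Each member adds a different site without choosing the fid; SQLite assigns
# the next integer key (normally the current maximum + 1).
fid_a = copy_a.execute("INSERT INTO sites (name) VALUES (?)", ("Site D",)).lastrowid
fid_b = copy_b.execute("INSERT INTO sites (name) VALUES (?)", ("Site E",)).lastrowid
print(fid_a, fid_b)  # same key, different features
```

Both inserts receive key 4, so the two branches contain different features under the same primary key.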

Using Kart in our project

  • Public project wiki
  • How to use the dataset and how to use Kart
  • Tips to solve common issues
  • Methodology and conventions
  • Internal use and external reference
  • Updated as the project proceeds

Conclusions

Advantages

  • Git-based tool
  • Graphical solution for those unfamiliar with git
  • Suited to fieldwork (no internet connection needed except to push changes to a remote)
  • Kart can fit well into archaeological Open Science practices
  • More transparency both during and after data creation process
  • No single downloadable file in online repositories (site stewardship)

Disadvantages

  • Not an easily accessible tool
  • Graphical interface still needs more work
  • Solving primary key conflicts requires the command line
  • Documentation is still catching up with recent developments
    • Contributing upstream from our wiki

Thank you!

Andrea Titolo (andrea.titolo@unito.it)

Alessio Palmisano (alessio.palmisano@unito.it)

Works Cited

Bar, S. and Zertal, A. (2021). The Manasseh Hill Country Survey Volume 6: The Eastern Samaria Shoulder, from Nahal Tirzah (Wadi Far’ah) to Maale Ephraim Junction, Brill.
Bar, S. and Zertal, A. (2022). The Manasseh Hill Country Survey Volume 7: The South-Eastern Samaria Shoulder, from Wadi Rashash to Wadi Aujah, Brill.
Coup, R. (2022a). Kart: An introduction to practical data versioning for geospatial.
Coup, R. (2022b). Kart: A Practical Tool for Versioning Geospatial Data.
Coup, R. (2023). 2023 QGIS Data Versioning with Kart - Robert Coup.
Finkelstein, I., Lederman, Z. and Bunimovitz, S. (1997). Highlands of Many Cultures: The Southern Samaria Survey; The Sites, Institute of Archaeology of Tel-Aviv University, Publications Section.
Kansa, E. (2012). Openness and Archaeology’s Information Ecosystem. World Archaeology 44: 498–520.
Kart Contributors (2023). Kart geospatial data version-control software.
Kloner, A. (2000). Survey of Jerusalem: The Southern Sector.
Marwick, B. (2017). Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24: 424–450.
Olaya, V. (2022). Spatial data versioning with the Kart QGIS Plugin with Victor Olaya.
Strupler, N. and Wilkinson, T. C. (2017). Reproducibility in the field: Transparency, version control and collaboration on the Project Panormos Survey.
Wallis, K. (2022). Open Science: A practical guide for PhD students, University College London.
Zertal, A. (2004). The Manasseh Hill Country Survey, Volume 1: The Shechem Syncline, Brill.
Zertal, A. (2007). The Manasseh Hill Country Survey, Volume 2: The Eastern Valleys and the Fringes of the Desert, Brill.
Zertal, A. and Bar, S. (2017). The Manasseh Hill Country Survey Volume 4: From Nahal Bezeq to the Sartaba, Brill.
Zertal, A. and Bar, S. (2019). The Manasseh Hill Country Survey Volume 5: The Middle Jordan Valley, from Wadi Fasael to Wadi Aujah, Brill.
Zertal, A. and Mirkam, N. (2016). The Manasseh Hill Country Survey Volume 3: From Nahal Iron to Nahal Shechem, Brill.